6 Semidefinite optimization¶

In this chapter we extend the conic optimization framework introduced before with symmetric positive semidefinite matrix variables.

6.1 Introduction to semidefinite matrices¶

6.1.1 Semidefinite matrices and cones¶

A symmetric matrix \(X\in \mathcal{S}^n\) is called symmetric positive semidefinite if

\[z^T X z \geq 0, \quad \forall z\in\mathbb{R}^n.\]

We then define the cone of symmetric positive semidefinite matrices as

(6.1)¶\[\PSD^n = \{ X \in \mathcal{S}^n \mid z^T X z \geq 0, \: \forall z\in \mathbb{R}^n \}.\]

For brevity we will often use the shorter notion semidefinite instead of symmetric positive semidefinite, and we will write \(X\succeq Y\) (\(X \preceq Y\)) as shorthand notation for \((X-Y)\in \PSD^n\) (\((Y-X)\in \PSD^n\)). As inner product for semidefinite matrices, we use the standard trace inner product for general matrices, i.e.,

\[\langle A, B \rangle := \mathrm{tr}(A^T B) = \sum_{ij}a_{ij}b_{ij}.\]

It is easy to see that (6.1) indeed specifies a convex cone; it is pointed (with origin \(X=0\)), and \(X,Y\in \PSD^n\) implies that \((\alpha X + \beta Y)\in \PSD^n, \: \alpha,\beta \geq 0\). Let us review a few equivalent definitions of \(\PSD^n\). It is well-known that every symmetric matrix \(A\) has a spectral factorization

(6.2)¶\[A = \sum_{i=1}^n \lambda_i q_i q_i^T,\]

where \(q_i \in \mathbb{R}^n\) are the (orthonormal) eigenvectors and \(\lambda_i\) are the eigenvalues of \(A\). Using the spectral factorization of \(A\) we have

\[x^T A x = \sum_{i=1}^n \lambda_i (x^T q_i)^2,\]

which shows that \(x^T A x \geq 0 \: \Leftrightarrow \: \lambda_i \geq 0, \: i=1,\dots,n\). In other words,

(6.3)¶\[\PSD^n = \{ X \in \mathcal{S}^n \mid \lambda_i(X)\geq 0, \: i=1,\dots,n \}.\]

Another useful characterization is that \(A\in \PSD^n\) if and only if it is a Grammian matrix \(A=V^T V\). Here \(V\) is called the Cholesky factor of \(A\). Using the Grammian representation we have

\[x^T A x = x^T V^T V x = \|V x\|_2^2,\]

i.e., if \(A=V^TV\) then \(x^T A x \geq 0\) for all \(x\). On the other hand, from the positive spectral factorization \(A=Q\Lambda Q^T\) we have \(A = V^T V\) with \(V=\Lambda^{1/2} Q^T\), where \(\Lambda^{1/2}=\Diag(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})\). We thus have the equivalent characterization

(6.4)¶\[\PSD^n = \{ X \in \mathcal{S}^n \mid X = V^T V \text{ for some } V\in\R^{n\times n}\}.\]
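For illustration, the characterizations (6.3) and (6.4) are easy to check numerically. The following sketch (using numpy, which is our own assumption and not part of this text) builds a semidefinite matrix from a Grammian factor, verifies (6.3), and recovers a factor \(V=\Lambda^{1/2}Q^T\) from the spectral factorization:

```python
import numpy as np

# Build a PSD matrix from a Grammian representation A = V^T V.
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 4))
A = V.T @ V

# Spectral characterization (6.3): all eigenvalues of A are nonnegative.
assert np.all(np.linalg.eigvalsh(A) >= -1e-12)

# Recover a Grammian factor V = Lambda^{1/2} Q^T from A = Q Lambda Q^T.
lam, Q = np.linalg.eigh(A)
W = np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T
assert np.allclose(W.T @ W, A)
```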

In a completely analogous way we define the cone of symmetric positive definite matrices as

\[\begin{split}\begin{array}{cl} \PD^n & = \{ X \in \mathcal{S}^n \mid z^T X z > 0, \: \forall z\in \R^n \} \\ & = \{ X \in \mathcal{S}^n \mid \lambda_i(X) > 0, \: i=1,\dots,n \} \\ & = \{ X \in \mathcal{S}^n \mid X = V^T V \text{ for some } V\in\R^{n\times n},\: \mathbf{rank}(V)=n\}, \end{array}\end{split}\]

and we write \(X\succ Y\) (\(X\prec Y\)) as shorthand notation for \((X-Y)\in \PD^n\) (\((Y-X)\in \PD^n\)).

The one-dimensional cone \(\PSD^1\) simply corresponds to \(\R_+\). Similarly, consider

\[\begin{split}X = \left[\begin{array}{cc} x_1 & x_3\\ x_3 & x_2 \end{array} \right]\end{split}\]

with determinant \(\det(X)=x_1 x_2 - x_3^2 = \lambda_1\lambda_2\) and trace \(\mathrm{tr}(X)=x_1 + x_2 = \lambda_1 + \lambda_2\). Therefore \(X\) has nonnegative eigenvalues if and only if

\[x_1x_2 \geq x_3^2, \quad x_1, x_2\geq 0,\]

which characterizes a three-dimensional scaled rotated cone, i.e.,

\[\begin{split}\left[\begin{array}{cc} x_1 & x_3\\ x_3 & x_2 \end{array} \right] \in \PSD^2 \quad \Longleftrightarrow \quad (x_1,x_2,x_3\sqrt{2})\in \mathcal{Q}_r^3.\end{split}\]

More generally we have

\[x\in \R^n_+ \quad \Longleftrightarrow \quad \Diag(x) \in \PSD^n\]

and

\[\begin{split}(t,x)\in \mathcal{Q}^{n+1} \quad \Longleftrightarrow \quad \left[\begin{array}{cc} t & x^T \\ x & t I \end{array}\right] \in \PSD^{n+1},\end{split}\]

where the latter equivalence follows immediately from Lemma 6.1. Thus both the linear and quadratic cone are embedded in the semidefinite cone. In practice, however, linear and quadratic cones should never be described using semidefinite constraints, which would result in a large performance penalty by squaring the number of variables.
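As a quick numerical illustration of the second embedding, the following sketch (numpy assumed; the random sampling is our own device) confirms that \((t,x)\in\mathcal{Q}^{n+1}\) exactly when the arrow matrix is semidefinite:

```python
import numpy as np

# Check that (t, x) in Q^{n+1}  <=>  [[t, x^T], [x, t*I]] is PSD.
rng = np.random.default_rng(1)
n = 3
for _ in range(1000):
    x = rng.standard_normal(n)
    t = rng.standard_normal() + 1.0
    if abs(t - np.linalg.norm(x)) < 1e-6:
        continue  # skip near-boundary points to avoid tolerance issues
    M = np.block([[np.array([[t]]), x[None, :]],
                  [x[:, None], t * np.eye(n)]])
    in_cone = t >= np.linalg.norm(x)
    is_psd = np.all(np.linalg.eigvalsh(M) >= -1e-9)
    assert in_cone == is_psd
```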

Example 6.1

As a more interesting example, consider the symmetric matrix

(6.5)¶\[\begin{split}A(x,y,z) = \left[ \begin{array}{ccc} 1 & x & y\\ x & 1 & z\\ y & z & 1 \end{array} \right]\end{split}\]

parametrized by \((x,y,z)\). The set

\[S = \{ (x,y,z) \in \R^3 \mid A(x,y,z) \in \PSD^3 \},\]

(shown in Fig. 6.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite representable set, which cannot be represented using (finitely many) linear or quadratic cones. To gain a geometric intuition of \(S\), we note that

\[\det(A(x,y,z)) = -(x^2 + y^2 + z^2 - 2xyz - 1),\]

so the boundary of \(S\) can be characterized as

\[x^2 + y^2 + z^2 - 2xyz = 1,\]

or equivalently as

\[\begin{split}\left[\begin{array}{c} x\\y \end{array}\right]^T \left[\begin{array}{rr} 1 & -z\\-z &1 \end{array}\right] \left[\begin{array}{c} x\\y \end{array}\right] = 1 - z^2.\end{split}\]

For \(z=0\) this describes a circle in the \((x,y)\)-plane, and for \(-1\leq z \leq 1\) it characterizes an ellipse (for a fixed \(z\)).

Fig. 6.1 Plot of spectrahedron \(S=\{ (x,y,z) \in \R^3 \mid A(x,y,z)\succeq 0 \}\).¶
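Membership in \(S\) is cheap to test numerically via the eigenvalue characterization (6.3); a small sketch (numpy assumed; the helper in_spectrahedron is our own):

```python
import numpy as np

def in_spectrahedron(x, y, z, tol=1e-9):
    """Test (x, y, z) in S by checking that A(x, y, z) is PSD."""
    A = np.array([[1.0, x, y], [x, 1.0, z], [y, z, 1.0]])
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

print(in_spectrahedron(0, 0, 0))      # True: the origin is an interior point
print(in_spectrahedron(1, 1, 1))      # True: a boundary point of S
print(in_spectrahedron(0.9, 0.9, 0))  # False: x^2 + y^2 + z^2 - 2xyz > 1
```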

6.1.2 Properties of semidefinite matrices¶

Many useful properties of (semi)definite matrices follow directly from the definitions (6.1)-(6.4) and their definite counterparts.

The diagonal elements of \(A\in\PSD^n\) are nonnegative. Let \(e_i\) denote the \(i\)th standard basis vector (i.e., \([e_i]_j=0,\: j\neq i\), \([e_i]_i=1\)). Then \(A_{ii}=e_i^T A e_i\), so (6.1) implies that \(A_{ii}\geq 0\).

A block-diagonal matrix \(A=\Diag(A_1,\dots,A_p)\) is (semi)definite if and only if each diagonal block \(A_i\) is (semi)definite.

Given a quadratic transformation \(M:=B^T A B\), \(M\succ 0\) if and only if \(A\succ 0\) and \(B\) has full column rank. This follows directly from the Grammian characterization \(M=(VB)^T(VB)\). For \(M\succeq 0\) we only require that \(A\succeq 0\). As an example, if \(A\) is (semi)definite then so is any symmetric permutation \(P^T A P\).

Any principal submatrix of \(A\in \PSD^n\) (\(A\) restricted to the same set of rows as columns) is positive semidefinite; this follows by restricting the Grammian characterization \(A=V^T V\) to a submatrix of \(V\).

The inner product of positive (semi)definite matrices is positive (nonnegative). For any \(A,B\in \PD^n\) let \(A=U^TU\) and \(B=V^T V\) where \(U\) and \(V\) have full rank. Then

\[\langle A, B \rangle = \mathrm{tr}(U^T U V^T V) = \|U V^T \|_F^2 > 0,\]

where strict positivity follows from the assumption that \(U\) has full column-rank, i.e., \(UV^T \neq 0\).

The inverse of a positive definite matrix is positive definite. This follows from the positive spectral factorization \(A=Q \Lambda Q^T\), which gives us

\[A^{-1} = Q \Lambda^{-1} Q^T,\]

where \(\Lambda_{ii}>0\). If \(A\) is semidefinite then the pseudo-inverse \(A^\dagger\) of \(A\) is semidefinite.

Consider a matrix \(X\in\mathcal{S}^n\) partitioned as

\[\begin{split}X = \left[\begin{array}{cc} A & B^T\\ B & C \end{array}\right].\end{split}\]

Let us find necessary and sufficient conditions for \(X\succ 0\). We know that \(A\succ 0\) and \(C\succ 0\) (since any principal submatrix must be positive definite). Furthermore, we can simplify the analysis using a nonsingular transformation

\[\begin{split}L = \left[\begin{array}{cc} I & 0\\ F & I \end{array}\right]\end{split}\]

to diagonalize \(X\) as \(L X L^T = D\), where \(D\) is block-diagonal. Note that \(\det(L)=1\), so \(L\) is indeed nonsingular. Then \(X\succ 0\) if and only if \(D\succ 0\). Expanding \(L X L^T = D\), we get

\[\begin{split}\left[\begin{array}{cc} A & AF^T + B^T\\ FA + B & F A F^T + F B^T + B F^T + C \end{array}\right] = \left[\begin{array}{cc} D_1 & 0\\ 0 & D_2 \end{array}\right].\end{split}\]

Since \(\det(A)\neq 0\) (by assuming that \(A\succ 0\)) we see that \(F=-BA^{-1}\) and direct substitution gives us

\[\begin{split}\left[\begin{array}{cc} A & 0\\ 0 & C - B A^{-1} B^T \end{array}\right] = \left[\begin{array}{cc} D_1 & 0\\ 0 & D_2 \end{array}\right].\end{split}\]
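The block-diagonalization above is easy to confirm numerically; a sketch with made-up data (numpy assumed):

```python
import numpy as np

# Random positive definite X, partitioned into blocks A (2x2), B, C (3x3).
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
X = M @ M.T + 5 * np.eye(5)
A, B, C = X[:2, :2], X[2:, :2], X[2:, 2:]

# With F = -B A^{-1}, the congruence L X L^T is block-diagonal,
# with blocks A and the Schur complement C - B A^{-1} B^T.
F = -B @ np.linalg.inv(A)
L = np.block([[np.eye(2), np.zeros((2, 3))], [F, np.eye(3)]])
D = L @ X @ L.T
S = C - B @ np.linalg.inv(A) @ B.T
assert np.allclose(D[:2, :2], A)
assert np.allclose(D[2:, 2:], S)
assert np.allclose(D[:2, 2:], 0)
```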

In the last part we have thus established the following useful result.

Lemma 6.1 Schur complement lemma

A symmetric matrix

\[\begin{split}X = \left[\begin{array}{cc} A & B^T\\ B & C \end{array}\right]\end{split}\]

is positive definite if and only if

\[A \succ 0, \quad C - B A^{-1} B^T \succ 0.\]

6.2 Semidefinite modeling¶

Having discussed different characterizations and properties of semidefinite matrices, we next turn to different functions and sets that can be modeled using semidefinite cones and variables. Most of those representations involve semidefinite matrix-valued affine functions, which we discuss next.

6.2.1 Linear matrix inequalities¶

Consider an affine matrix-valued mapping \(A:\R^n \mapsto \mathcal{S}^m\):

(6.8)¶\[A(x) = A_0 + x_1 A_1 + \dots + x_n A_n.\]

A linear matrix inequality (LMI) is a constraint of the form

(6.9)¶\[A_0 + x_1 A_1 + \dots + x_n A_n \succeq 0\]

in the variable \(x\in \R^n\) with symmetric coefficients \(A_i\in\mathcal{S}^m,\: i=0,\dots,n\). As a simple example consider the matrix in (6.5),

\[A(x,y,z) = A_0 + x A_1 + y A_2 + z A_3 \succeq 0\]

with

\[\begin{split}\begin{array}{cccc} A_0 = I, & A_1 = \left[ \begin{array}{ccc} 0 & 1 & 0\\ 1 & 0 & 0\\0 & 0 &0 \end{array}\right], & A_2 = \left[ \begin{array}{ccc} 0 & 0 & 1\\ 0 & 0 & 0\\1 & 0 &0 \end{array}\right], & A_3 = \left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & 1\\0 & 1 &0 \end{array}\right]. \end{array}\end{split}\]

Alternatively, we can describe the linear matrix inequality \(A(x,y,z)\succeq 0\) as

\[X\in \PSD^3, \qquad X_{11} = X_{22} = X_{33}=1, \qquad x=X_{21},\quad y=X_{31},\quad z=X_{32},\]

i.e., as a semidefinite variable with fixed diagonal; these two alternative formulations illustrate the difference between primal and dual form of semidefinite problems, see Sec. 8.6 (Semidefinite duality and LMIs).
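To make the primal/dual distinction concrete, here is a sketch of both formulations of the same toy problem, minimizing \(x+y+z\) over the spectrahedron (6.5). It uses the cvxpy modeling package, which is our own assumption; this text does not prescribe any particular software:

```python
import cvxpy as cp
import numpy as np

A0 = np.eye(3)
A1 = np.array([[0., 1, 0], [1, 0, 0], [0, 0, 0]])
A2 = np.array([[0., 0, 1], [0, 0, 0], [1, 0, 0]])
A3 = np.array([[0., 0, 0], [0, 0, 1], [0, 1, 0]])

# Dual (LMI) form: scalar variables entering a linear matrix inequality.
x, y, z = cp.Variable(), cp.Variable(), cp.Variable()
p1 = cp.Problem(cp.Minimize(x + y + z),
                [A0 + x * A1 + y * A2 + z * A3 >> 0])
p1.solve()

# Primal form: a semidefinite matrix variable with fixed unit diagonal.
X = cp.Variable((3, 3), PSD=True)
p2 = cp.Problem(cp.Minimize(X[1, 0] + X[2, 0] + X[2, 1]),
                [cp.diag(X) == 1])
p2.solve()
print(p1.value, p2.value)  # both approximately -1.5 (x = y = z = -1/2)
```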

6.2.2 Eigenvalue optimization¶

Consider a symmetric matrix \(A\in \mathcal{S}^m\) and let its eigenvalues be denoted by

\[\lambda_1(A) \geq \lambda_2(A) \geq \dots \geq \lambda_m(A).\]

A number of different functions of \(\lambda_i\) can then be described using a mix of linear and semidefinite constraints.

Sum of eigenvalues

The sum of the eigenvalues corresponds to

\[\sum_{i=1}^m \lambda_i (A) = \mathrm{tr}(A).\]

Largest eigenvalue

The largest eigenvalue can be characterized in epigraph form \(\lambda_1(A)\leq t\) as

(6.10)¶\[tI - A \succeq 0.\]

To verify this, suppose we have a spectral factorization \(A=Q \Lambda Q^T\) where \(Q\) is orthogonal and \(\Lambda\) is diagonal. Then \(t\) is an upper bound on the largest eigenvalue if and only if

\[Q^T(tI - A)Q = t I - \Lambda \succeq 0.\]

Thus we can minimize the largest eigenvalue of \(A\).
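For instance, minimizing the largest eigenvalue of an affine matrix function reduces to a single LMI in \((x,t)\); a cvxpy sketch with made-up data (cvxpy itself being an assumption outside this text):

```python
import cvxpy as cp
import numpy as np

# Minimize lambda_1(A0 + x*A1) via (6.10): t*I - A(x) >= 0.
A0 = np.array([[2.0, 1.0], [1.0, 0.0]])
A1 = np.array([[0.0, 1.0], [1.0, 2.0]])
x, t = cp.Variable(), cp.Variable()
prob = cp.Problem(cp.Minimize(t),
                  [t * np.eye(2) - (A0 + x * A1) >> 0])
prob.solve()
print(t.value, np.linalg.eigvalsh(A0 + x.value * A1).max())  # should agree
```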

Smallest eigenvalue

The smallest eigenvalue can be described in hypograph form \(\lambda_m(A) \geq t\) as

(6.11)¶\[A \succeq tI,\]

i.e., we can maximize the smallest eigenvalue of \(A\).

Eigenvalue spread

The eigenvalue spread can be modeled in epigraph form

\[\lambda_1(A)-\lambda_m(A)\leq t\]

by combining the two linear matrix inequalities in (6.10) and (6.11), i.e.,

(6.12)¶\[\begin{split}\begin{array}{r} zI \preceq A \preceq sI, \\ s-z \leq t. \end{array}\end{split}\]

Spectral radius

The spectral radius \(\rho(A):=\max_i |\lambda_i(A) |\) can be modeled in epigraph form \(\rho(A)\leq t\) using two linear matrix inequalities

\[-tI \preceq A \preceq tI.\]

Condition number of a positive definite matrix

Suppose now that \(A\in\PSD^m\). The condition number of a positive definite matrix can be minimized by noting that \(\lambda_1(A)/\lambda_m(A)\leq t\) if and only if there exists a \(\mu>0\) such that

\[\mu I \preceq A \preceq \mu t I,\]

or equivalently if and only if \(I \preceq \mu^{-1}A \preceq t I\). If \(A=A(x)\) is represented as in (6.8) then a change of variables \(z:=x/\mu\), \(\nu:=1/\mu\) leads to a problem of the form

\begin{equation} \begin{array}{lc} \mbox{minimize} & t \\ \mbox{subject to} & I \preceq \nu A_0 + \sum_{i=1}^n z_i A_i \preceq t I, \end{array} \end{equation}

from which we recover the solution \(x=z/\nu\). In essence, we first normalize the spectrum by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear matrix inequality. Compare Sec. 2.2.5 (Homogenization).
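A sketch of this homogenized formulation with made-up data (cvxpy assumed); note that the constraints are affine in \((\nu, z, t)\), so the problem is a genuine SDP:

```python
import cvxpy as cp
import numpy as np

# Minimize the condition number of A(x) = A0 + x*A1.
A0 = np.array([[3.0, 0.5], [0.5, 1.0]])
A1 = np.array([[1.0, 0.0], [0.0, -1.0]])
nu = cp.Variable(nonneg=True)
z, t = cp.Variable(), cp.Variable()
M = nu * A0 + z * A1
prob = cp.Problem(cp.Minimize(t),
                  [M - np.eye(2) >> 0, t * np.eye(2) - M >> 0])
prob.solve()
x = z.value / nu.value  # recover the original variable
print(t.value, np.linalg.cond(A0 + x * A1))  # should agree
```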

6.2.3 Log-determinant¶

Consider again a symmetric positive-definite matrix \(A\in\PSD^m\). The determinant

\[\det(A) = \prod_{i=1}^m \lambda_i(A)\]

is neither convex nor concave, but \(\log \det(A)\) is concave and we can write the inequality

\[t \leq \log \det(A)\]

in the form of the following problem:

(6.13)¶\[\begin{split}\begin{array}{l} \left[\begin{array}{cc} A & Z \\ Z^T & \diag(Z) \end{array}\right] \succeq 0, \\ Z\ \mathrm{is}\ \mathrm{lower}\ \mathrm{triangular}, \\ t\leq \sum_i \log{Z_{ii}}. \end{array}\end{split}\]

The equivalence of the two problems follows from Lemma 6.1 and superadditivity of the determinant on semidefinite matrices, i.e., \(\det(X+Y)\geq\det(X)+\det(Y)\) for \(X,Y\succeq 0\). That is:

\[\begin{split}\begin{array}{rl} 0\leq \det(A-Z\diag(Z)^{-1}Z^T) &\leq \det(A)-\det(Z\diag(Z)^{-1}Z^T) \\ & =\det(A)-\det(Z). \end{array}\end{split}\]

On the other hand the optimal value \(\det(A)\) is attained for \(Z=LD\) if \(A=LDL^T\) is the LDL factorization of \(A\).

The last inequality in problem (6.13) can of course be modeled using the exponential cone as in Sec. 5.2 (Modeling with the exponential cone). Note that we can replace that bound with \(t\leq (\prod_i Z_{ii})^{1/m}\) to get instead the model of \(t\leq \det(A)^{1/m}\) using Sec. 4.2.4 (Geometric mean).
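In a modeling package the concave function \(\log\det\) is often available directly; the following cvxpy sketch (an assumption, with toy data; a solver supporting the exponential cone, such as SCS, is required) maximizes \(\log\det A(x)\):

```python
import cvxpy as cp
import numpy as np

# Maximize log det(A0 + x*A1) over a box; here det(A0 + x*A1) = 2 - x^2.
A0 = np.array([[2.0, 0.0], [0.0, 1.0]])
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
x = cp.Variable()
prob = cp.Problem(cp.Maximize(cp.log_det(A0 + x * A1)),
                  [cp.abs(x) <= 0.5])
prob.solve()
print(x.value, prob.value)  # x ~ 0, optimal value ~ log(2)
```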

6.2.4 Singular value optimization¶

We next consider a non-square matrix \(A\in\R^{m\times p}\). Assume \(p\leq m\) and denote the singular values of \(A\) by

\[\sigma_1(A) \geq \sigma_2(A) \geq \dots \geq \sigma_p(A) \geq 0.\]

The singular values are connected to the eigenvalues of \(A^T A\) via

(6.14)¶\[\sigma_i(A) = \sqrt{\lambda_i(A^T A)},\]

and if \(A\) is square and symmetric then \(\sigma_i(A)=|\lambda_i(A)|\). We show next how to optimize several functions of the singular values.

Largest singular value

The epigraph \(\sigma_1(A)\leq t\) can be characterized using (6.14) as

\[A^T A \preceq t^2 I,\]

which from Schur’s lemma is equivalent to

(6.15)¶\[\begin{split} \left[\begin{array}{cc} tI & A \\ A^T & t I \end{array} \right]\succeq 0.\end{split}\]

The largest singular value \(\sigma_1(A)\) is also called the spectral norm or the \(\ell_2\)-norm of \(A\), \(\|A\|_2 := \sigma_1(A)\).
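A numerical check of (6.15) (numpy sketch): the eigenvalues of the block matrix are \(t\pm\sigma_i(A)\) together with \(m-p\) copies of \(t\), so it is semidefinite precisely when \(t\geq\sigma_1(A)\).

```python
import numpy as np

# Check (6.15): t >= sigma_1(A)  <=>  [[t*I, A], [A^T, t*I]] is PSD.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
s1 = np.linalg.svd(A, compute_uv=False)[0]
for t in (0.5 * s1, 1.000001 * s1, 2.0 * s1):
    M = np.block([[t * np.eye(4), A], [A.T, t * np.eye(3)]])
    print(t >= s1, bool(np.all(np.linalg.eigvalsh(M) >= -1e-9)))
```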

Sum of singular values

The trace norm or the nuclear norm of \(X\) is the dual of the \(\ell_2\)-norm:

(6.16)¶\[ \| X \|_{*}=\sup_{\| Z \|_2\leq 1}\mathrm{tr}(X^T Z).\]

It turns out that the nuclear norm corresponds to the sum of the singular values,

(6.17)¶\[ \| X \|_{*} = \sum_{i=1}^p \sigma_i(X) = \sum_{i=1}^p \sqrt{\lambda_i(X^TX)},\]

which is easy to verify using singular value decomposition \(X=U \Sigma V^T\). We have

\[\begin{split}\begin{aligned} \sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^T Z) & = \sup_{\|Z\|_2\leq 1} \mathrm{tr}(\Sigma^T U^T ZV)\\ & = \sup_{\| Y\|_2\leq 1} \mathrm{tr}(\Sigma^T Y)\\ & = \sup_{|y_i| \leq 1} \sum_{i=1}^p \sigma_i y_i = \sum_{i=1}^p \sigma_i. \end{aligned}\end{split}\]

which shows (6.17). Alternatively, we can express (6.16) as the solution to

(6.18)¶\[\begin{split}\begin{array}{ll} \mbox{maximize} & \mathrm{tr}(X^T Z) \\ \mbox{subject to} & \left[\begin{array}{cc}I & Z^T\\Z & I\end{array}\right] \succeq 0, \end{array}\end{split}\]

with the dual problem (see Example 8.8)

(6.19)¶\[\begin{split}\begin{array}{ll} \mbox{minimize} & \mathrm{tr}(U + V)/2 \\ \mbox{subject to} & \left[\begin{array}{cc}U & X^T\\X & V\end{array}\right] \succeq 0. \end{array}\end{split}\]

In other words, using strong duality we can characterize the epigraph \(\| A \|_{*}\leq t\) with

(6.20)¶\[\begin{split}\left[\begin{array}{cc} U & A^T \\ A & V\end{array}\right] \succeq 0, \quad \mathrm{tr}(U + V)/2 \leq t.\end{split}\]

For a symmetric matrix the nuclear norm corresponds to the sum of absolute values of eigenvalues, and for a semidefinite matrix it simply corresponds to the trace of the matrix.
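A sketch comparing the semidefinite model (6.19)/(6.20) of the nuclear norm with a direct SVD computation (cvxpy and numpy assumed):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

# Model (6.19): minimize tr(U + V)/2 subject to [[U, A^T], [A, V]] >= 0.
U = cp.Variable((3, 3), symmetric=True)
V = cp.Variable((4, 4), symmetric=True)
prob = cp.Problem(cp.Minimize((cp.trace(U) + cp.trace(V)) / 2),
                  [cp.bmat([[U, A.T], [A, V]]) >> 0])
prob.solve()
print(prob.value, np.linalg.svd(A, compute_uv=False).sum())  # should agree
```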

6.2.5 Matrix inequalities from Schur’s Lemma¶

Several quadratic or quadratic-over-linear matrix inequalities follow immediately from Schur’s lemma. Suppose \(A\in \R^{p\times m}\) and \(B\in \PD^p\) are matrix variables. Then

\[A^T B^{-1} A \preceq C\]

if and only if

\[\begin{split}\left[\begin{array}{cc} C & A^T \\ A & B \end{array} \right] \succeq 0.\end{split}\]

6.2.6 Nonnegative polynomials¶

We next consider characterizations of polynomials constrained to be nonnegative on the real axis. To that end, consider a polynomial basis function

\[v(t) = \left(1, t, \dots, t^{2n}\right).\]

It is then well-known that a polynomial \(f:\R\mapsto \R\) of even degree \(2n\) is nonnegative on the entire real axis

(6.21)¶\[f(t) := x^T v(t) = x_0 + x_1 t + \dots + x_{2n} t^{2n} \geq 0, \: \forall t\]

if and only if it can be written as a sum of squared polynomials of degree \(n\) (or less), i.e., for some \(q_1,q_2\in \R^{n+1}\)

(6.22)¶\[f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2, \qquad u(t):=(1,t,\dots,t^n).\]

It turns out that an equivalent characterization of \(\{ x \mid x^Tv(t)\geq 0, \: \forall t\}\) can be given in terms of a semidefinite variable \(X\),

(6.23)¶\[x_i = \langle X, H_i\rangle, \quad i=0,\dots,2n, \qquad X \in \PSD^{n+1},\]

where \(H^{n+1}_i\in \R^{(n+1)\times (n+1)}\) are Hankel matrices

\[\begin{split}[H_i]_{kl} = \left \{\begin{array}{ll}1, & k+l=i,\\0, &\text{otherwise}.\end{array} \right.\end{split}\]

When there is no ambiguity, we drop the superscript on \(H_i\). For example, for \(n=2\) we have

\[\begin{split}H_0 = \left[\begin{array}{ccc} 1 & 0 & 0\\0 & 0 & 0\\0 & 0 & 0 \end{array} \right], \quad H_1 = \left[\begin{array}{ccc} 0 & 1 & 0\\1 & 0 & 0\\0 & 0 & 0 \end{array} \right], \quad \dots \quad H_4 = \left[\begin{array}{ccc} 0 & 0 & 0\\0 & 0 & 0\\0 & 0 & 1 \end{array} \right].\end{split}\]

To verify that (6.21) and (6.23) are equivalent, we first note that

\[u(t)u(t)^T = \sum_{i=0}^{2n} H_i v_i(t),\]

i.e.,

\[\begin{split}\left[\begin{array}{c} 1\\ t \\ \vdots \\t^n \end{array}\right] \left[\begin{array}{c} 1\\ t \\ \vdots \\t^n \end{array}\right]^T = \left[\begin{array}{cccc} 1 & t & \dots & t^n\\ t & t^2 & \dots & t^{n+1}\\ \vdots & \vdots & \ddots & \vdots \\ t^n & t^{n+1} & \dots & t^{2n} \end{array}\right].\end{split}\]

Assume next that \(f(t)\geq 0\). Then from (6.22) we have

\[\begin{split}\begin{array}{cl} f(t) & = (q_1^T u(t))^2 + (q_2^T u(t))^2 \\ & = \langle q_1q_1^T + q_2 q_2^T, u(t)u(t)^T \rangle\\ & = \sum_{i=0}^{2n} \langle q_1q_1^T + q_2 q_2^T, H_i \rangle v_i(t), \end{array}\end{split}\]

i.e., we have \(f(t)=x^T v(t)\) with \(x_i = \langle X, H_i \rangle, \: X = (q_1q_1^T + q_2 q_2^T)\succeq 0\). Conversely, assume that (6.23) holds. Then

\[f(t) = \sum_{i=0}^{2n} \langle H_i, X\rangle v_i(t) = \langle X , \sum_{i=0}^{2n}H_i v_i(t)\rangle = \langle X, u(t) u(t)^T\rangle \geq 0\]

since \(X\succeq 0\). In summary, we can characterize the cone of nonnegative polynomials over the real axis as

(6.24)¶\[K^n_{\infty} = \{ x\in \R^{2n+1} \mid x_i = \langle X, H_i\rangle, \: i=0,\dots,2n, \: X\in\PSD^{n+1} \}.\]

Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility problem.
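The forward direction of (6.24) is easy to test numerically: any semidefinite \(X\) generates, via \(x_i=\langle X,H_i\rangle\), a polynomial that is nonnegative everywhere (numpy sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2                                 # polynomial degree 2n = 4
G = rng.standard_normal((n + 1, n + 1))
X = G @ G.T                           # a random PSD matrix

# x_i = <X, H_i> sums the anti-diagonal k + l = i of X.
coeffs = [sum(X[k, i - k] for k in range(max(0, i - n), min(i, n) + 1))
          for i in range(2 * n + 1)]

ts = np.linspace(-10.0, 10.0, 2001)
f = sum(c * ts**i for i, c in enumerate(coeffs))
print(f.min() >= -1e-9)               # True: f(t) = u(t)^T X u(t) >= 0
```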

6.2.6.1 Nonnegativity on a finite interval¶

As an extension we consider a basis function of degree \(n\),

\[v(t)=(1,t,\ldots,t^n).\]

A polynomial \(f(t):=x^T v(t)\) is then nonnegative on a subinterval \(I=[a,b]\subset \R\) if and only if \(f(t)\) can be written as a sum of weighted squares,

\[f(t) = w_1(t)(q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2\]

where \(w_i(t)\) are polynomials nonnegative on \([a,b]\). To describe the cone

\[K^n_{a,b} = \{ x\in \R^{n+1} \mid f(t) = x^T v(t) \geq 0, \: \forall t\in [a, b] \}\]

we distinguish between polynomials of odd and even degree.

Even degree. Let \(n=2m\) and denote

\[u_1(t)=(1,t,\ldots,t^m), \qquad u_2(t)=(1,t,\ldots,t^{m-1}).\]

We choose \(w_1(t)=1\) and \(w_2(t)=(t-a)(b-t)\) and note that \(w_2(t)\geq 0\) on \([a,b]\). Then \(f(t)\geq 0, \forall t\in [a, b]\) if and only if

\[f(t) = (q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2\]

for some \(q_1, q_2\), and an equivalent semidefinite characterization can be found as

(6.25)¶\[\begin{split} K^n_{a,b} = \{ x\in \R^{n+1} \mid x_i = \langle X_1, H^{m+1}_i\rangle + \langle X_2, (a+b)H^{m}_{i-1}-ab H^{m}_i - H^{m}_{i-2}\rangle, \\ \, i=0,\ldots,n, \, X_1 \in \PSD^{m+1}, \, X_2 \in \PSD^{m} \}.\end{split}\]

Odd degree. Let \(n=2m+1\) and denote \(u(t)=(1,t,\ldots,t^m)\). We choose \(w_1(t)=(t-a)\) and \(w_2(t)=(b-t)\). We then have that \(f(t)=x^T v(t)\geq 0, \forall t\in[a,b]\) if and only if

\[f(t) = (t-a)(q_1^T u(t))^2 + (b-t)(q_2^T u(t))^2\]

for some \(q_1, q_2\), and an equivalent semidefinite characterization can be found as

(6.26)¶\[\begin{split}K^n_{a,b} = \{ x\in \R^{n+1} \mid x_i = \langle X_1, H^{m+1}_{i-1} - a H^{m+1}_i\rangle + \langle X_2, bH^{m+1}_i- H^{m+1}_{i-1}\rangle, \\ \, i=0,\ldots,n, \, X_1, X_2 \in \PSD^{m+1} \}.\end{split}\]

6.2.7 Hermitian matrices¶

Semidefinite optimization can be extended to complex-valued matrices. To that end, let \(\mathcal{H}^n\) denote the cone of Hermitian matrices of order \(n\), i.e.,

(6.27)¶\[X \in \mathcal{H}^n \qquad \Longleftrightarrow \qquad X\in \mathbb{C}^{n\times n}, \quad X^H = X,\]

where superscript ’\(H\)’ denotes Hermitian (or complex) transposition. Then \(X\in \mathcal{H}_+^n\) if and only if

\[\begin{split}\begin{array}{cl} z^H X z & = (\Re z - i \Im z)^T (\Re X + i\Im X) (\Re z + i \Im z)\\ & = \left[\begin{array}{c}\Re z\\ \Im z\end{array}\right]^T \left[ \begin{array}{rr} \Re X & -\Im X\\ \Im X & \Re X \end{array} \right] \left[ \begin{array}{c} \Re z\\ \Im z \end{array} \right] \geq 0, \quad \forall z \in \mathbb{C}^n. \end{array}\end{split}\]

In other words,

(6.28)¶\[\begin{split} X \in \mathcal{H}_+^n \qquad \Longleftrightarrow \qquad \left[\begin{array}{rr} \Re X & -\Im X\\ \Im X & \Re X \end{array} \right]\in \PSD^{2n}.\end{split}\]

Note that (6.27) implies skew-symmetry of \(\Im X\), i.e., \(\Im X = -\Im X^T\).

6.2.8 Nonnegative trigonometric polynomials¶

As a complex-valued variation of the sum-of-squares representation we consider trigonometric polynomials; optimization over cones of nonnegative trigonometric polynomials has several important engineering applications. Consider a trigonometric polynomial evaluated on the complex unit-circle

(6.29)¶\[ f(z) = x_0 + 2\Re ( \sum_{i=1}^n x_i z^{-i} ), \qquad |z|=1\]

parametrized by \(x\in \R\times \mathbb{C}^n\). We are interested in characterizing the cone of trigonometric polynomials that are nonnegative on the angular interval \([0,\pi]\),

\[K^n_{0,\pi} = \{ x\in \R\times \mathbb{C}^{n} \mid x_0 + 2\Re ( \sum_{i=1}^n x_i z^{-i} )\geq 0, \: \forall z=e^{jt}, t\in[0,\pi] \}.\]

Consider a complex-valued basis function

\[v(z) = (1,z,\dots,z^n).\]

The Riesz-Fejér Theorem states that a trigonometric polynomial \(f(z)\) in (6.29) is nonnegative (i.e., \(x \in K_{0,\pi}^n\)) if and only if for some \(q\in \mathbb{C}^{n+1}\)

(6.30)¶\[f(z) = | q^H v(z) |^2.\]

Analogously to Sec. 6.2.6 (Nonnegative polynomials) we have a semidefinite characterization of the sum-of-squares representation, i.e., \(f(z)\geq 0,\ \forall z=e^{jt},\ t\in[0,2\pi]\) if and only if

(6.31)¶\[x_i = \langle X, T_i\rangle, \quad i=0,\dots,n, \qquad X\in\mathcal{H}_+^{n+1}\]

where \(T_i^{n+1}\in \R^{(n+1)\times (n+1)}\) are Toeplitz matrices

\[\begin{split}[T_i]_{kl} = \left \{\begin{array}{ll}1, & k-l=i\\0, &\text{otherwise}\end{array} \right., \quad i=0,\dots,n.\end{split}\]

When there is no ambiguity, we drop the superscript on \(T_i\). For example, for \(n=2\) we have

\[\begin{split}T_0 = \left[\begin{array}{ccc} 1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1 \end{array} \right], \quad T_1 = \left[\begin{array}{ccc} 0 & 0 & 0\\1 & 0 & 0\\0 & 1 & 0 \end{array} \right], \quad T_2 = \left[\begin{array}{ccc} 0 & 0 & 0\\0 & 0 & 0\\1 & 0 & 0 \end{array} \right].\end{split}\]

To prove correctness of the semidefinite characterization, we first note that

\[v(z)v(z)^H = I + \sum_{i=1}^n \left(T_i v_i(z)\right ) + \sum_{i=1}^n \left( T_i v_i(z)\right )^H\]

i.e.,

\[\begin{split}\left[ \begin{array}{c} 1\\ z \\ \vdots \\z ^n \end{array} \right] \left[ \begin{array}{c} 1\\ z \\ \vdots \\z^n \end{array} \right]^H = \left[ \begin{array}{cccc} 1 & z^{-1} & \dots & z^{-n}\\ z & 1 & \dots & z^{1-n}\\ \vdots & \vdots & \ddots & \vdots \\ z^n & z^{n-1} & \dots & 1 \end{array} \right].\end{split}\]
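This identity is easy to verify numerically (numpy sketch, \(n=2\)):

```python
import numpy as np

n = 2
# [T_i]_{kl} = 1 iff k - l = i; np.eye(..., k=-i) puts ones on that diagonal.
T = [np.eye(n + 1, k=-i) for i in range(n + 1)]
z = np.exp(1j * 0.7)                     # a point on the unit circle
v = z ** np.arange(n + 1)
lhs = np.outer(v, v.conj())
rhs = np.eye(n + 1, dtype=complex)
for i in range(1, n + 1):
    rhs += T[i] * z**i + (T[i] * z**i).conj().T
print(np.allclose(lhs, rhs))             # True
```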

Next assume that (6.30) is satisfied. Then

\[\begin{split}\begin{array}{cl} f(z) & = \langle qq^H, v(z)v(z)^H \rangle\\ & = \langle qq^H, I\rangle + \langle qq^H, \sum_{i=1}^n T_i v_i(z)\rangle + \langle qq^H, \sum_{i=1}^n T_i^T \overline{v_i(z)}\rangle\\ & = \langle qq^H, I\rangle + \sum_{i=1}^n\langle qq^H, T_i \rangle v_i(z) + \sum_{i=1}^n\langle qq^H, T_i^T \rangle \overline{v_i(z)}\\ & = x_0 + 2\Re ( \sum_{i=1}^n x_i v_i(z) ) \end{array}\end{split}\]

with \(x_i = \langle qq^H, T_i \rangle\). Conversely, assume that (6.31) holds. Then

\[f(z) = \langle X, I\rangle + \sum_{i=1}^n\langle X, T_i \rangle v_i(z) + \sum_{i=1}^n\langle X, T_i^T \rangle \overline{v_i(z)} = \langle X, v(z) v(z)^H \rangle \geq 0.\]

In other words, we have shown that

(6.32)¶\[K^n_{0,\pi} = \{ x\in \R\times \mathbb{C}^n \mid x_i = \langle X, T_i\rangle, \: i=0,\dots,n, \: X\in\mathcal{H}_+^{n+1} \}.\]

6.2.8.1 Nonnegativity on a subinterval¶

We next sketch a few useful extensions. An extension of the Riesz-Fejér Theorem states that a trigonometric polynomial \(f(z)\) of degree \(n\) is nonnegative on \(I(a,b)=\{ z \mid z=e^{jt}, \: t\in [a,b]\subseteq [0,\pi] \}\) if and only if it can be written as a weighted sum of squared trigonometric polynomials

\[f(z) = | f_1(z) |^2 + g(z) | f_2(z) |^2\]

where \(f_1, f_2, g\) are trigonometric polynomials of degree \(n\), \(n-d\) and \(d\), respectively, and \(g(z)\geq 0\) on \(I(a,b)\). For example \(g(z) = z + z^{-1} - 2\cos \alpha\) is nonnegative on \(I(0,\alpha)\), and it can be verified that \(f(z)\geq 0, \: \forall z\in I(0,\alpha)\) if and only if

\[x_i = \langle X_1, T_i^{n+1} \rangle + \langle X_2 , T_{i+1}^n \rangle + \langle X_2 , T_{i-1}^n \rangle - 2\cos(\alpha)\langle X_2 , T_i^n\rangle,\]

for \(X_1\in \mathcal{H}_+^{n+1}\), \(X_2\in\mathcal{H}_+^n\), i.e.,

(6.33)¶\[\begin{split} K^n_{0,\alpha} = \{ x\in \R\times \mathbb{C}^n \mid x_i = \langle X_1, T_i^{n+1} \rangle + \langle X_2 , T_{i+1}^n \rangle + \langle X_2 , T_{i-1}^n \rangle \\ - 2\cos(\alpha)\langle X_2 , T_i^n\rangle, \: X_1\in \mathcal{H}_+^{n+1}, X_2\in\mathcal{H}_+^n \}.\end{split}\]

Similarly \(f(z)\geq 0, \: \forall z \in I(\alpha,\pi)\) if and only if

\[x_i = \langle X_1, T_i^{n+1} \rangle - \langle X_2 , T_{i+1}^n \rangle - \langle X_2 , T_{i-1}^n \rangle +2\cos(\alpha)\langle X_2 , T_i^n\rangle,\]

i.e.,

(6.34)¶\[\begin{split} K^n_{\alpha,\pi} = \{ x\in \R\times \mathbb{C}^n \mid x_i = \langle X_1, T_i^{n+1} \rangle - \langle X_2 , T_{i+1}^n \rangle - \langle X_2 , T_{i-1}^n \rangle \\ +2\cos(\alpha)\langle X_2 , T_i^n\rangle, \: X_1\in \mathcal{H}_+^{n+1}, X_2\in\mathcal{H}_+^n \}.\end{split}\]

6.3 Semidefinite optimization case studies¶

6.3.1 Nearest correlation matrix¶

We consider the set

\[S = \{ X \in \PSD^n \mid X_{ii}=1, \: i=1,\dots,n \}\]

(shown in Fig. 6.1 for \(n=3\)). For \(A\in \mathcal{S}^n\) the nearest correlation matrix is

\[X^\star = \arg \min_{X\in S} \| A - X\|_F,\]

i.e., the projection of \(A\) onto the set \(S\). To pose this as a conic optimization we define the linear operator

\[\mathbf{svec}(U) = (U_{11},\sqrt{2}U_{21},\dots,\sqrt{2}U_{n1}, U_{22},\sqrt{2}U_{32},\dots,\sqrt{2}U_{n2},\dots,U_{nn}),\]

which extracts and scales the lower-triangular part of \(U\). We then get a conic formulation of the nearest correlation problem exploiting symmetry of \(A-X\),

(6.35)¶\[\begin{split}\begin{array}{lrl} \mbox{minimize} & t &\\ \mbox{subject to} & \| z \|_2 & \leq t,\\ & \mathbf{svec}(A-X) & = z,\\ & \diag(X) & = e,\\ & X \succeq 0. \end{array}\end{split}\]

This is an example of a problem with both conic quadratic and semidefinite constraints in primal form. We can add different constraints to the problem, for example a bound \(\gamma\) on the smallest eigenvalue by replacing \(X\succeq 0\) with \(X\succeq \gamma I\).
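A compact sketch of (6.35) in cvxpy (an assumption outside this text; the explicit \(\mathbf{svec}\) reformulation is what a conic solver would see internally):

```python
import cvxpy as cp
import numpy as np

# Nearest correlation matrix: project a symmetric A onto
# {X >= 0, diag(X) = e} in the Frobenius norm.
rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
X = cp.Variable((4, 4), PSD=True)
prob = cp.Problem(cp.Minimize(cp.norm(A - X, "fro")),
                  [cp.diag(X) == 1])
prob.solve()
print(np.round(X.value, 3))
# A lower bound lambda_min(X) >= gamma would instead use a symmetric
# variable with the constraint X - gamma * np.eye(4) >> 0.
```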

6.3.2 Extremal ellipsoids¶

Given a polytope we can find the largest ellipsoid contained in the polytope, or the smallest ellipsoid containing the polytope (for certain representations of the polytope).

6.3.2.1 Maximal inscribed ellipsoid¶

Consider a polytope

\[S = \{ x \in \R^n \: | \: a_i^T x \leq b_i, \: i=1,\dots,m \}.\]

The ellipsoid

\[\mathcal{E}:=\{ x \: | \: x=Cu + d, \: \|u\|\leq 1\}\]

is contained in \(S\) if and only if

\[\max_{\| u\|_2\leq 1}a_i^T(Cu + d) \leq b_i, \quad i=1,\dots,m\]

or equivalently, if and only if

\[\| C a_i\|_2 + a_i^T d \leq b_i, \quad i=1,\dots,m.\]

Since \(\mathbf{Vol}({\mathcal{E}})\approx \det(C)^{1/n}\) the maximum-volume inscribed ellipsoid is the solution to

\[\begin{split}\begin{array}{lll} \mbox{maximize} & \det(C) &\\ \mbox{subject to} & \| C a_i\|_2 + a_i^T d \leq b_i, & i=1,\dots,m,\\ & C \succeq 0. & \end{array}\end{split}\]

In Sec. 6.2.3 (Log-determinant) we show how to maximize the determinant of a positive definite matrix.
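A sketch of the inscribed-ellipsoid problem for a small polytope (cvxpy assumed; its \(\log\det\) objective stands in for the determinant maximization of Sec. 6.2.3, and the polytope data is made up):

```python
import cvxpy as cp
import numpy as np

# Polytope {x : a_i^T x <= b_i}: the unit box cut by x1 + x2 <= 1.5.
A = np.array([[1., 0], [-1, 0], [0, 1], [0, -1], [1, 1]])
b = np.array([1., 1, 1, 1, 1.5])
n = 2
C = cp.Variable((n, n), PSD=True)
d = cp.Variable(n)
cons = [cp.norm(C @ A[i]) + A[i] @ d <= b[i] for i in range(len(b))]
prob = cp.Problem(cp.Maximize(cp.log_det(C)), cons)
prob.solve()
print(C.value, d.value)  # shape and center of the inscribed ellipsoid
```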

6.3.2.2 Minimal enclosing ellipsoid¶

Next consider a polytope given as the convex hull of a set of points,

\[S'=\mathbf{conv}\{ x_1, x_2, \dots, x_m \}, \quad x_i\in \R^n.\]

The ellipsoid

\[\mathcal{E}' := \{ x \: | \: \|Px-c\|_2 \leq 1 \}\]

has \(\mathbf{Vol}({\mathcal{E'}})\approx \det(P)^{-1/n}\), so the minimum-volume enclosing ellipsoid is the solution to

\[\begin{split}\begin{array}{lll} \mbox{maximize} & \det(P) &\\ \mbox{subject to} & \| Px_i-c\|_2 \leq 1, & i=1,\ldots,m,\\ & P \succeq 0. & \end{array}\end{split}\]

Fig. 6.2 Example of inner and outer ellipsoidal approximations of a pentagon in \(\R^2\).¶

6.3.3 Polynomial curve-fitting¶

Consider a univariate polynomial of degree \(n\),

\[f(t) = x_0 + x_1 t + x_2 t^2 + \dots + x_n t^n.\]

Often we wish to fit such a polynomial to a given set of measurements or control points

\[\{ (t_1, y_1), (t_2, y_2), \ldots, (t_m, y_m) \},\]

i.e., we wish to determine coefficients \(x_i, \, i=0,\ldots,n\) such that

\[f(t_j) \approx y_j, \quad j=1,\ldots,m.\]

To that end, define the Vandermonde matrix

\[\begin{split}A = \left [ \begin{array}{ccccc} 1 & t_1 & t_1^2 & \dots & t_1^n\\ 1 & t_2 & t_2^2 & \dots & t_2^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_m & t_m^2 & \dots & t_m^n\\ \end{array} \right ].\end{split}\]

We can then express the desired curve-fit compactly as

\[A x \approx y,\]

i.e., as a linear expression in the coefficients \(x\). When the number of measurements equals the number of coefficients, \(m=n+1\), the matrix \(A\) is square and non-singular (provided the points \(t_j\) are distinct), so we can solve

\[Ax = y\]

to find a polynomial that passes through all the control points \((t_j, y_j)\). Similarly, if \(n+1>m\) there are infinitely many solutions satisfying the underdetermined system \(Ax=y\). A typical choice in that case is the least-norm solution

\[x_{\mathrm{ln}} = \arg \min_{Ax=y} \| x \|_2\]

which (assuming again that the points \(t_j\) are distinct) equals

\[x_{\mathrm{ln}} = A^T (AA^T)^{-1} y.\]
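A numpy sketch of both cases (interpolation and least-norm fit) with made-up data:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])

# Square case m = n + 1: the interpolating polynomial of degree 2.
A = np.vander(t, 3, increasing=True)     # columns 1, t, t^2
x = np.linalg.solve(A, y)
print(np.allclose(A @ x, y))             # True

# Underdetermined case (degree 4): least-norm solution A^T (A A^T)^{-1} y.
A5 = np.vander(t, 5, increasing=True)
x_ln = A5.T @ np.linalg.solve(A5 @ A5.T, y)
print(np.allclose(A5 @ x_ln, y))         # True: still interpolates
```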

On the other hand, if \(n+1<m\) the system is overdetermined and generally has no exact solution; a natural choice is then the least-squares fit \(x_{\mathrm{ls}} = \arg\min_x \|Ax - y\|_2\).


